Abstract

Lenstra's integer factorization algorithm is asymptotically one of the fastest known
algorithms, and is also ideally suited for parallel computation. We suggest a way in which
the algorithm can be sped up by the addition of a second phase. Under some plausible
assumptions, the speedup is of order $\log(p)$, where $p$ is the factor which is found. In practice
the speedup is significant. We mention some refinements which give greater speedup, an
alternative way of implementing a second phase, and the connection with Pollard's "$p - 1$"
factorization algorithm.

1 Introduction 

Recently H.W. Lenstra Jr. proposed a new integer factorization algorithm, which we shall call
"Lenstra's algorithm" or the "one-phase elliptic curve algorithm" [17]. Under some plausible
assumptions Lenstra's algorithm finds a prime factor $p$ of a large composite integer $N$ in
expected time

$$T_1(p) = \exp\left(\sqrt{(2 + o(1)) \ln p \, \ln\ln p}\right), \qquad (1.1)$$

where "$o(1)$" means a term which tends to zero as $p \to \infty$. Previously algorithms with
running time $\exp\left(\sqrt{(1 + o(1)) \ln N \, \ln\ln N}\right)$ were known [27]. However, since
$p^2 \le N$ for the smallest prime factor $p$, Lenstra's algorithm is comparable in the worst case
and often much better, since it often happens that $2 \ln p \ll \ln N$.

The Brent-Pollard "rho" algorithm [5] is similar to Lenstra's algorithm in that its expected
running time depends on $p$; in fact it is of order $p^{1/2}$. Asymptotically $T_1(p) \ll p^{1/2}$,
but because of the overheads associated with Lenstra's algorithm we expect the "rho" algorithm
to be faster if $p$ is sufficiently small. The results of §8 suggest how large $p$ has to be before
Lenstra's algorithm is faster.

After some preliminaries in §§2-4, we describe Lenstra's algorithm in §5, and outline the
derivation of (1.1). In §6 and §7 we describe how Lenstra's algorithm can be sped up by
the addition of a second phase which is based on the same idea as the well-known "paradox"
concerning the probability that two people at a party have the same birthday [25]. The
two-phase algorithm has expected running time $O(T_1(p)/\ln p)$. In practice, for $p$ around
$10^{20}$, the "birthday paradox algorithm" is about 4 times faster than Lenstra's (one-phase)
algorithm. The performance of the various algorithms is compared in §8, and some refinements
are mentioned in §9.

2 Our unit of work 

The factorization algorithms which we consider use arithmetic operations modulo $N$, where $N$
is the number to be factorized. We are interested in the case that $N$ is large (typically 50 to
200 decimal digits) so multiple-precision operations are involved. As our basic unit of work (or
time) we take one multiplication modulo $N$ (often just called "a multiplication" below). More
precisely, given integers $a$, $b$ in $[0, N)$, our unit of work is the cost of computing $a*b \bmod N$.
Because $N$ is assumed to be large, we can simplify the analysis by ignoring the cost of additions
mod $N$ or of multiplications/divisions by small (i.e. single-precision) integers, so long as the
total number of such operations is not much greater than the number of multiplications mod $N$.
See [13, 20] for implementation hints.

In some of the algorithms considered below it is necessary to compute inverses modulo $N$,
i.e. given an integer $a$ in $(0, N)$, compute $u$ in $(0, N)$ such that $a*u = 1 \pmod N$. We write
$u = a^{-1} \pmod N$. $u$ can be computed by the extended GCD algorithm [13], which finds integers
$u$ and $v$ such that $au + Nv = g$, where $g$ is the GCD of $a$ and $N$. We can always assume that
$g = 1$, for otherwise $g$ is a nontrivial factor of $N$, and the factorization algorithm can terminate.
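
As an illustration of the last remark, the following is a minimal sketch (not the implementation used for the timings in this paper) of an inverse computation by the extended GCD algorithm which reports a factor of $N$ whenever the inverse does not exist. The function name `inverse_or_factor` and its return convention are our own assumptions.

```python
def inverse_or_factor(a, N):
    """Return (u, None) with a*u = 1 (mod N), or (None, g) if g = GCD(a, N) > 1."""
    # Extended Euclid: every remainder r satisfies r = u*a (mod N).
    r0, r1 = N, a % N
    u0, u1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    g = r0                      # g = GCD(a, N)
    if g != 1:
        return None, g          # no inverse: g divides N
    return u0 % N, None


u, g = inverse_or_factor(3, 77)    # inverse exists
v, h = inverse_or_factor(14, 77)   # no inverse, a divisor of 77 is returned in h
```

Note that `g` may equal $N$ itself (when $a \equiv 0$), in which case the factor is trivial; a caller would have to check for this.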

Suppose that the computation of $a^{-1} \pmod N$ by the extended GCD algorithm takes the
same time as $K$ multiplications (mod $N$). Our first implementation gave $K \approx 30$, but by using
Lehmer's method [16] this was reduced to $6 < K < 10$ (the precise value depending on the size
of $N$). It turns out that most computations of $a^{-1} \pmod N$ can be avoided at the expense of
about 8 multiplications (mod $N$), so we shall assume that $K = 8$.

Some of the algorithms require the computation of large powers (mod $N$), i.e. given $a$ in $[0, N)$
and $b \ge 1$, we have to compute $a^b \pmod N$. We shall assume that this is done by the "binary"
algorithm [13] which requires between $\log_2 b$ and $2\log_2 b$ multiplications (mod $N$) - on average
say $(3/2)\log_2 b$ multiplications (of which about $\log_2 b$ are squarings). The constant 3/2 could be
reduced slightly by use of the "power tree" or other sophisticated powering algorithms [13].
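
The operation count just quoted can be checked with a short sketch of the left-to-right binary algorithm which counts squarings and other multiplications separately. The function name `power_mod` and the returned triple are illustrative choices, not taken from [13].

```python
def power_mod(a, b, N):
    """Binary powering: return (a**b mod N, number of squarings, other multiplications)."""
    assert b >= 1
    result = a % N
    squarings = mults = 0
    for bit in bin(b)[3:]:          # binary digits of b after the leading 1
        result = result * result % N
        squarings += 1
        if bit == "1":
            result = result * a % N
            mults += 1
    return result, squarings, mults


value, squarings, mults = power_mod(3, 13, 1000)   # 13 = 1101 in binary
```

The number of squarings is $\lfloor \log_2 b \rfloor$ and the number of other multiplications is one less than the number of 1 bits of $b$, which gives the range $\log_2 b$ to $2\log_2 b$ stated above.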

3 Prime factors of random integers 

In order to predict the expected running time of Lenstra's algorithm and our extensions of it, we
need some results on the distribution of prime factors of random integers. Consider a random
integer close to $M$, with prime factors $n_1 \ge n_2 \ge \cdots$. For $\alpha > 1$, $\beta > 1$, define

$$\rho(\alpha) = \lim_{M \to \infty} \mathrm{Prob}\left(n_1 < M^{1/\alpha}\right)$$

and

$$\mu(\alpha, \beta) = \lim_{M \to \infty} \mathrm{Prob}\left(n_2 < M^{1/\alpha} \text{ and } n_1 < M^{\beta/\alpha}\right).$$

(For a precise definition of "a random integer close to $M$", see [14]. It is sufficient to consider
integers uniformly distributed in $[1, M]$.)

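Several authors have considered the function $\rho(\alpha)$, see for example [7, 9, 13, 14, 18]. It
satisfies a differential-difference equation

$$\alpha \rho'(\alpha) + \rho(\alpha - 1) = 0$$

and may be computed by numerical integration from

$$\rho(\alpha) = \begin{cases} 1 & \text{if } 0 \le \alpha \le 1, \\ \dfrac{1}{\alpha} \displaystyle\int_{\alpha-1}^{\alpha} \rho(t)\,dt & \text{if } \alpha > 1. \end{cases}$$

We shall need the asymptotic results

$$\ln \rho(\alpha) = -\alpha(\ln\alpha + \ln\ln\alpha - 1) + o(\alpha) \qquad (3.1)$$

and

$$\rho(\alpha - 1)/\rho(\alpha) = \alpha(\ln\alpha + O(\ln\ln\alpha)) \qquad (3.2)$$

as $\alpha \to \infty$.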
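
The integral form above lends itself to a simple tabulation. The following is a minimal sketch, assuming a uniform grid and the trapezoidal rule (our choice; the paper does not say which quadrature was used). The name `dickman_rho_table` and the grid size are assumptions.

```python
import math


def dickman_rho_table(alpha_max, n=1000):
    """Tabulate rho(i/n) for i = 0 .. alpha_max*n using
    alpha*rho(alpha) = integral of rho over [alpha-1, alpha] (trapezoidal rule)."""
    size = int(round(alpha_max * n))
    rho = [1.0] * (size + 1)            # rho = 1 on [0, 1]
    window = float(n - 1)               # sum of rho[k], k = i-n+1 .. i-1, for i = n+1
    for i in range(n + 1, size + 1):
        # Trapezoid: i*rho[i] = rho[i-n]/2 + window + rho[i]/2, solved for rho[i].
        rho[i] = (0.5 * rho[i - n] + window) / (i - 0.5)
        window += rho[i] - rho[i - n + 1]
    return rho


N_GRID = 1000
RHO = dickman_rho_table(6.0, N_GRID)
rho_2 = RHO[2 * N_GRID]                 # should be close to 1 - ln 2
ratio = RHO[4 * N_GRID] / RHO[5 * N_GRID]   # compare with (3.2) at alpha = 5
```

For $1 \le \alpha \le 2$ the exact value is $\rho(\alpha) = 1 - \ln\alpha$, which gives a convenient check on the tabulated values.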



The function $\mu(\alpha, \beta)$ is not so well-known, but is crucial for the analysis of the two-phase
algorithms. Knuth and Trabb Pardo [14] consider $\mu(\alpha, 2)$ and by following their argument with
trivial modifications we find that

$$\mu(\alpha, \beta) = \rho(\alpha) + \int_{\alpha-\beta}^{\alpha-1} \frac{\rho(t)}{\alpha - t}\,dt. \qquad (3.3)$$

When comparing the two-phase and one-phase algorithms the ratio $\rho(\alpha)/\mu(\alpha, \beta)$ is of interest,
and we shall need the bound

$$\rho(\alpha)/\mu(\alpha, \beta) = O\left(\ln\alpha \, (\alpha\ln\alpha)^{-\beta}\right) \qquad (3.4)$$

as $\alpha \to \infty$, for fixed $\beta > 1$.

4 The group of an elliptic curve (mod p) 

In this section we consider operations mod $p$ rather than mod $N$, and assume that $p$ is a prime
and $p \ge 5$. When applying the results of this section to factorization, $p$ is an (unknown) prime
factor of $N$, so we have to work mod $N$ rather than mod $p$.

Let $S$ be the set of points $(x, y)$ lying on the "elliptic curve"

$$y^2 = x^3 + ax + b \pmod p, \qquad (4.1)$$

where $a$ and $b$ are constants, $4a^3 + 27b^2 \ne 0$. Let

$$G = S \cup \{I\},$$

where $I$ is the "point at infinity" and may be thought of as $(0, \infty)$. Lenstra's algorithm is based
on the fact that there is a natural way to define an Abelian group on $G$. Geometrically, if $P_1$
and $P_2 \in G$, we define $P_3 = P_1 * P_2$ by taking $P_3$ to be the reflection in the $x$-axis of the point
$Q$ which lies on the elliptic curve (4.1) and is collinear with $P_1$ and $P_2$. Algebraically, suppose
$P_i = (x_i, y_i)$ for $i = 1, 2, 3$. Then $P_3$ is defined by:

    if P1 = I then P3 := P2
    else if P2 = I then P3 := P1
    else if (x1, y1) = (x2, -y2) then P3 := I
    else
    begin
        if x1 = x2 then λ := (2*y1)^(-1) * (3*x1^2 + a) mod p
        else λ := (x1 - x2)^(-1) * (y1 - y2) mod p;
        {λ is the gradient of the line joining P1 and P2}
        x3 := (λ^2 - x1 - x2) mod p;
        y3 := (λ*(x1 - x3) - y1) mod p
    end.

It is well-known that $(G, *)$ forms an Abelian group with identity element $I$. Moreover, by the
"Riemann hypothesis for finite fields" [12], the group order $g = |G|$ satisfies the inequality

$$|g - p - 1| < 2\sqrt{p}. \qquad (4.2)$$

Lenstra's heuristic hypothesis is that, if $a$ and $b$ are chosen at random, then $g$ will be
essentially random in that the results of §3 will apply with $M = p$. Some results of Birch [3]
suggest its plausibility. Nevertheless, the divisibility properties of $g$ are not quite what would
be expected for a randomly chosen integer near $p$, e.g. the probability that $g$ is even is
asymptotically 2/3 rather than 1/2. We shall accept Lenstra's hypothesis as we have no other way
to predict the performance of his algorithm. Empirical results described in §8 indicate that the
algorithms do perform roughly as predicted.
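
The group law and the inequality (4.2) can be checked directly for a small prime. The sketch below is illustrative only: it counts points by brute force, which is feasible only for tiny $p$, and the names `ec_mul`, `ec_pow` and `group_order` are our own. It assumes Python 3.8 or later for `pow(x, -1, p)`.

```python
def ec_mul(P1, P2, a, p):
    """P1*P2 on y^2 = x^3 + a*x + b (mod p); None represents the identity I."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P2 is the inverse of P1
    if x1 == x2:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        lam = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_pow(P, k, a, p):
    """P^k in the multiplicative notation of this section (binary algorithm)."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_mul(R, R, a, p)
        if bit == "1":
            R = ec_mul(R, P, a, p)
    return R


def group_order(a, b, p):
    """g = |G|: the number of affine points plus one for the point at infinity."""
    squares = {}
    for y in range(p):
        squares[y * y % p] = squares.get(y * y % p, 0) + 1
    return 1 + sum(squares.get((x ** 3 + a * x + b) % p, 0) for x in range(p))


p, a, b = 101, 2, 3                 # arbitrary small example, 4a^3 + 27b^2 != 0 (mod p)
g = group_order(a, b, p)
P = next((x, y) for x in range(p) for y in range(p)
         if (y * y - x ** 3 - a * x - b) % p == 0)
hasse_ok = (g - p - 1) ** 2 < 4 * p          # inequality (4.2)
identity = ec_pow(P, g, a, p)                # P^g = I, so this is None
```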

Note that the computation of $P_1 * P_2$ requires $(3 + K)$ units of work if $P_1 \ne P_2$, and $(4 + K)$
units of work if $P_1 = P_2$. (Squaring is harder than multiplication!) If we represent $P_i$ as
$(x_i/z_i, y_i/z_i)$ then the algorithm given above for the computation of $P_1 * P_2$ can be modified to
avoid GCD computations; assuming that $z_1 = z_2$ (which can usually be ensured at the expense
of 2 units of work), a squaring then requires 12 units and a nonsquaring multiplication requires
9 units of work.

The reader who is interested in learning more about the theory of elliptic curves should 
consult [11], [12] or [15]. 

5 Lenstra's algorithm 

The idea of Lenstra's algorithm is to perform a sequence of pseudo-random trials, where each
trial uses a randomly chosen elliptic curve and has a nonzero probability of finding a factor
of $N$. Let $m$ and $m'$ be parameters whose choice will be discussed later. To perform a trial, first
choose $P = (x, y)$ and $a$ at random. This defines an elliptic curve

$$y^2 = x^3 + ax + b \pmod N. \qquad (5.1)$$

(In practice it is sufficient for $a$ to be a single-precision random integer, which reduces the cost
of operations in $G$; also, there is no need to check if GCD$(N, 4a^3 + 27b^2) \ne 1$ as this is extremely
unlikely unless $N$ has a small prime factor.) Next compute $Q = P^E$, where $E$ is a product of
primes less than $m$,

$$E = \prod_{p_i \text{ prime},\ p_i < m} p_i^{e_i},$$

where

$$e_i = \lfloor \ln(m')/\ln(p_i) \rfloor.$$

Actually, $E$ is not computed. Instead, $Q$ is computed by repeated operations of the form
$P := P^k$, where $k = p_i^{e_i}$ is a prime power less than $m'$, and the operations on $P$ are performed
in the group $G$ defined in §4, with one important difference. The difference is that, because
a prime factor $p$ of $N$ is not known, all arithmetic operations are performed modulo $N$ rather
than modulo $p$.

Suppose initially that $m' = N$. If we are lucky, all prime factors of $g = |G|$ will be less
than $m$, so $g \mid E$ and $P^E = I$ in the group $G$. This will be detected because an attempt to
compute $t^{-1} \pmod N$ will fail because GCD$(N, t) > 1$. In this case the trial succeeds. (It may,
rarely, find the trivial factor $N$ if all prime factors of $N$ are found simultaneously, but we neglect
this possibility.)
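
A trial as just described can be sketched as follows, with $m' = m$ and the affine formulas of §4 applied modulo $N$. This is a toy version for small $N$, not the program used for the experiments of §8; the exception class `FactorFound` and the helper names are our own assumptions, and Python 3.8 or later is assumed for `pow(t, -1, N)`.

```python
import math
import random


class FactorFound(Exception):
    """Raised when t^(-1) (mod N) does not exist; carries g = GCD(N, t) > 1."""
    def __init__(self, g):
        super().__init__(g)
        self.g = g


def inverse(t, N):
    g = math.gcd(t % N, N)
    if g != 1:
        raise FactorFound(g)
    return pow(t % N, -1, N)


def ec_mul(P1, P2, a, N):
    """P1*P2 by the formulas of section 4, with arithmetic mod N; None stands for I."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        if (y1 + y2) % N == 0:
            return None
        if y1 != y2:                  # y1 = +y2 modulo some prime factors, -y2 modulo others
            raise FactorFound(math.gcd(y1 - y2, N))
        lam = (3 * x1 * x1 + a) * inverse(2 * y1, N) % N
    else:
        lam = (y1 - y2) * inverse(x1 - x2, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)


def ec_pow(P, k, a, N):
    R = None
    for bit in bin(k)[2:]:
        R = ec_mul(R, R, a, N)
        if bit == "1":
            R = ec_mul(R, P, a, N)
    return R


def primes_below(m):
    sieve = [True] * max(m, 2)
    sieve[0] = sieve[1] = False
    for i in range(2, int(m ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, m, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def trial(N, m, rng):
    """One trial with m' = m; returns a nontrivial factor of N, or None on failure."""
    x, y, a = rng.randrange(N), rng.randrange(N), rng.randrange(N)
    P = (x, y)                        # b is implied by (x, y) and a; it is never needed
    try:
        for q in primes_below(m):
            k = q
            while k * q < m:          # largest power of q which is less than m
                k *= q
            P = ec_pow(P, k, a, N)
    except FactorFound as found:
        return found.g if 1 < found.g < N else None
    return None


def find_factor(N, m, max_trials, seed=1):
    rng = random.Random(seed)
    for _ in range(max_trials):
        g = trial(N, m, rng)
        if g is not None:
            return g
    return None


factor = find_factor(455839, 100, 1000)     # 455839 = 599 * 761
```

Which of the two prime factors is returned depends on the random curves, so no particular value should be expected.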

Making the heuristic assumption mentioned in §4, and neglecting the fact that the results of
§3 only apply in the limit as $M$ (or $p$) $\to \infty$, the probability that a trial succeeds in finding the
prime factor $p$ of $N$ is just $\rho(\alpha)$, where $\alpha = \ln(p)/\ln(m)$.

In practice we choose $m' = m$ rather than $m' = N$, because this significantly reduces the
cost of a trial without significantly reducing the probability of success. Assuming $m' = m$,
well-known results on the distribution of primes [10] give $\ln(E) \sim m$, so the work per trial
is approximately $c_1 m$, where $c_1 = \left(\frac{11}{3} + K\right) \frac{3}{2\ln 2}$. Here $c_1$ is the product of the average work
required to perform a multiplication in $G$ times the constant which arises from our use of
the binary algorithm for computing powers (see §2). Since $m = p^{1/\alpha}$, the expected work to find
$p$ is

$$W_1(\alpha) = c_1 p^{1/\alpha}/\rho(\alpha). \qquad (5.2)$$

To minimise $W_1(\alpha)$ we differentiate the right side of (5.2) and set the result to zero, obtaining
$\ln(p) = -\alpha^2 \rho'(\alpha)/\rho(\alpha)$, or (from the differential equation satisfied by $\rho$),

$$\ln p = \frac{\alpha \rho(\alpha - 1)}{\rho(\alpha)}. \qquad (5.3)$$

In practice $p$ is not known in advance, so it is difficult to choose $\alpha$ so that (5.3) is satisfied.
This point is discussed in §8. For the moment assume that we know or guess an approximation
to $\log(p)$, and choose $\alpha$ so that (5.3) holds, at least approximately. From (3.2),

$$\ln p = \alpha^2(\ln\alpha + O(\ln\ln\alpha)), \qquad (5.4)$$

so

$$\alpha \sim \sqrt{\frac{2\ln p}{\ln\ln p}} \qquad (5.5)$$

and

$$\ln W_1(\alpha) \sim \frac{\rho(\alpha - 1)}{\rho(\alpha)} - \ln\rho(\alpha) \sim 2\alpha\ln\alpha \sim \sqrt{2\ln p \, \ln\ln p}. \qquad (5.6)$$

Thus

$$T_1(p) = W_1(\alpha) = \exp\left(\sqrt{(2 + o(1)) \ln p \, \ln\ln p}\right), \qquad (5.7)$$

as stated in §1. It may be informative to write (5.7) as

$$T_1(p) = W_1(\alpha) = p^{2/\alpha + o(1/\alpha)}, \qquad (5.8)$$

so $2/\alpha$ is roughly the exponent which is to be compared with 1 for the method of trial division
or 1/2 for the Brent-Pollard "rho" method. For $10^{10} < p < 10^{30}$, $\alpha$ is in the interval (3.2, 5.0).
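
The minimisation of (5.2) is easy to carry out numerically. The sketch below is our own illustration under stated assumptions: it tabulates $\rho$ by the trapezoidal rule applied to the integral form of §3, takes $K = 8$ and $c_1 = (11/3 + K) \cdot 3/(2\ln 2)$, and searches a grid of $\alpha$ values. The function names and the grid are assumptions, and the model ignores any refinements used for the tables of §8.

```python
import math


def dickman_rho_table(alpha_max, n=1000):
    """rho(i/n) for i = 0 .. alpha_max*n, from alpha*rho(alpha) = integral over [alpha-1, alpha]."""
    size = int(round(alpha_max * n))
    rho = [1.0] * (size + 1)
    window = float(n - 1)               # sum of rho[k] for k = i-n+1 .. i-1, at i = n+1
    for i in range(n + 1, size + 1):
        rho[i] = (0.5 * rho[i - n] + window) / (i - 0.5)
        window += rho[i] - rho[i - n + 1]
    return rho


def one_phase_optimum(p, K=8.0, alpha_max=8.0, n=1000):
    """Return (alpha, log10 W1) minimising W1 = c1 * p^(1/alpha) / rho(alpha) on a grid."""
    c1 = (11.0 / 3.0 + K) * 3.0 / (2.0 * math.log(2.0))
    rho = dickman_rho_table(alpha_max, n)
    best_alpha, best_log10_w = None, float("inf")
    for i in range(n, len(rho)):
        alpha = i / n
        log10_w = math.log10(c1) + math.log10(p) / alpha - math.log10(rho[i])
        if log10_w < best_log10_w:
            best_alpha, best_log10_w = alpha, log10_w
    return best_alpha, best_log10_w


alpha_20, log10_w_20 = one_phase_optimum(1e20)
alpha_30, log10_w_30 = one_phase_optimum(1e30)
```

Under these assumptions the minimum is quite flat in $\alpha$, so a coarse grid is adequate; the resulting values of $\log_{10} W_1$ can be compared with the entries for the one-phase algorithm in Table 1.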

6 The "birthday paradox" two-phase algorithm 

In this section we show how to increase the probability of success of a trial of Lenstra's algorithm
by the addition of a "second phase". Let $m = p^{1/\alpha}$ be as in §5, and $m' = m^{\beta} > m$. Let $g$ be
the order of the random group $G$ for a trial of Lenstra's algorithm, and suppose that $g$ has
prime factors $n_1 \ge n_2 \ge \ldots$ Then, making the same assumptions as in §5, the probability
that $n_1 < m'$ and $n_2 < m$ is $\mu(\alpha, \beta)$, where $\mu$ is defined by (3.3). Suppose we perform a trial of
Lenstra's algorithm, computing $Q = P^E$ as described in §5. With probability $\mu(\alpha, \beta) - \rho(\alpha)$ we
have $m \le n_1 < m'$ and $n_2 < m$, in which case the trial fails because $Q \ne I$, but $Q^{n_1} = I$ in $G$.
(As in §5, $g$ should really be the order of $P$ in $G$ rather than the order of $G$, but this difference
is unimportant and will be neglected.)

Let $H = \langle Q \rangle$ be the cyclic group generated by $Q$. A nice idea is to take some pseudo-random
function $f : H \to H$, define $Q_0 = Q$ and $Q_{i+1} = f(Q_i)$ for $i = 0, 1, \ldots$, and generate $Q_1, Q_2, \ldots$
until $Q_{2i} = Q_i$ in $G$. As in the Brent-Pollard "rho" algorithm [5], we expect this to take $O(\sqrt{n_1})$
steps. The only flaw is that we do not know how to define a suitable pseudo-random function $f$.
Hence, we resort to the following (less efficient) algorithm.



Define $Q_1 = Q$ and

$$Q_{j+1} = \begin{cases} Q_j^2 & \text{with probability } 1/2, \\ Q_j^2 * Q & \text{with probability } 1/2, \end{cases}$$

for $j = 1, 2, \ldots, r - 1$, so $Q_1, \ldots, Q_r$ are essentially random points in $H$ and are generated at
the expense of $O(r)$ group operations. Suppose $Q_j = (x_j, y_j)$ and let

$$d = \prod_{i=1}^{r-1} \prod_{j=i+1}^{r} (y_i - y_j) \pmod N. \qquad (6.1)$$

If, for some $i < j \le r$, $Q_i = Q_j$ in $G$, then $p \mid (y_i - y_j)$ so $p \mid d$ and we can find the factor $p$ of
$N$ by computing GCD$(N, d)$. (We cannot find $i$ and $j$ by the algorithm used in the Brent-Pollard
"rho" algorithm because $Q_i = Q_j$ does not imply that $Q_{i+1} = Q_{j+1}$.)

The probability that $p \mid d$ is the same as the probability that at least two out of $r$ people have
the same birthday (on a planet with $n_1$ days in a year). For example, if $n_1 = 365$ and $r = 23$,
the probability $p_E \approx 1/2$.
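
Both the product (6.1) and the birthday estimate are easy to try on small numbers. The sketch below is illustrative only: `pairwise_difference_gcd` evaluates a product of the form (6.1) in the obvious way, using $r(r-1)/2$ multiplications mod $N$, and `collision_probability` evaluates the exact birthday probability. Both names, and the toy inputs, are our own assumptions.

```python
import math


def pairwise_difference_gcd(values, N):
    """GCD(N, d) where d is the product of values[i] - values[j] over i < j, mod N."""
    d = 1
    r = len(values)
    for i in range(r - 1):
        for j in range(i + 1, r):
            d = d * (values[i] - values[j]) % N
    return math.gcd(d, N)


def collision_probability(r, n):
    """Probability that at least two of r uniform samples from n values coincide."""
    no_collision = 1.0
    for j in range(1, r):
        no_collision *= 1.0 - j / n
    return 1.0 - no_collision


# Toy example with N = 7 * 11: 3 and 10 agree mod 7, so the factor 7 is exposed.
g = pairwise_difference_gcd([3, 10, 5], 77)
p_23 = collision_probability(23, 365)
p_22 = collision_probability(22, 365)
estimate = 1.0 - math.exp(-23 * 23 / (2 * 365))   # usual exponential approximation
```

With 22 people the probability is still below 1/2, which agrees with the threshold $r \approx (2 n_1 \ln 2)^{1/2} \approx 22.5$ for $n_1 = 365$.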

In general, for $r \ll n_1$,

$$p_E = 1 - \prod_{j=1}^{r-1} \left(1 - \frac{j}{n_1}\right) \approx 1 - \exp\left(-\frac{r^2}{2 n_1}\right), \qquad (6.2)$$

so we see that $p_E \ge 1/2$ if $r \gtrsim (2 n_1 \ln 2)^{1/2}$.



We can obtain a good approximation to the behaviour of the "birthday paradox" algorithm
by replacing the right side of (6.2) by a step function which is 1 if $r^2 > 2 n_1 \ln 2$ and 0 if
$r^2 < 2 n_1 \ln 2$. Thus, a trial of the "birthday paradox" algorithm will succeed with probability
approximately $\mu(\alpha, \beta)$, where $\beta$ is defined by $r^2 = 2 m^{\beta} \ln 2$, i.e.

$$\beta = \frac{2\ln r - \ln(2\ln 2)}{\ln m} \qquad (6.3)$$

and $\mu(\alpha, \beta)$ is as in §3. A more precise expression for the probability of success is

$$\rho(\alpha) + \int_0^{\alpha-1} \left(1 - 2^{-p^{(\beta-\alpha+t)/\alpha}}\right) \frac{\rho(t)}{\alpha - t}\,dt. \qquad (6.4)$$

Computation shows that (6.3) gives an estimate of the probability of success which is within
10% of the estimate (6.4) for the values of $p$, $\alpha$ and $\beta$ which are of interest, so we shall use (6.3)
below (but the numerical results given in §8 were computed using (6.4)).

A worthwhile refinement is to replace $d$ of (6.1) by

$$D = \prod_{i=1}^{r-1} \prod_{j=i+1}^{r} (x_i - x_j) \pmod N. \qquad (6.5)$$

Since $(x_j, -y_j)$ is the inverse of $(x_j, y_j)$ in $H$, this refinement effectively "folds" $H$ by identifying
each point in $H$ with its inverse. The effect is that (6.2) becomes

$$p_E \approx 1 - \exp\left(-\frac{r^2}{n_1}\right) \qquad (6.6)$$

and (6.3) becomes $r^2 = m^{\beta} \ln 2$, i.e.

$$\beta = \frac{2\ln r - \ln\ln 2}{\ln m}. \qquad (6.7)$$

(6.4) still holds so long as $\beta$ is defined by (6.7) instead of (6.3).
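
The relation (6.7) between $r$, $m$ and $\beta$ is simple to evaluate. The following sketch (function names are our own) converts in both directions and can be compared with the values of $m$, $\beta$ and $r$ listed in Table 2 of §8; since the tabulated $\beta$ is rounded to two decimal places, agreement to within a few percent is all that should be expected.

```python
import math


def second_phase_size(m, beta):
    """r defined by r^2 = m^beta * ln 2, as in (6.7)."""
    return math.sqrt(m ** beta * math.log(2.0))


def beta_from_size(r, m):
    """beta = (2 ln r - ln ln 2) / ln m, as in (6.7)."""
    return (2.0 * math.log(r) - math.log(math.log(2.0))) / math.log(m)


# (m, beta, r) as given in Table 2 for p = 10^10, 10^20, 10^30.
table_2 = [(484, 1.56, 104), (19970, 1.35, 669), (397600, 1.27, 2939)]
sizes = [second_phase_size(m, beta) for m, beta, _ in table_2]
```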



7 The use of fast polynomial evaluation 

Let $P(x)$ be the polynomial with roots $x_1, \ldots, x_r$, i.e.

$$P(x) = \prod_{j=1}^{r} (x - x_j) = \sum_{j=0}^{r} a_j x^j \pmod N \qquad (7.1)$$

and let $M(r)$ be the work necessary to multiply two polynomials of degree $r$, obtaining a product
of degree $2r$. As usual, we assume that all arithmetic operations are performed modulo $N$, where
$N$ is the number which we are trying to factorize.

Because a suitable root of unity (mod $N$) is not known, we are unable to use algorithms based
on the FFT [1]. However, it is still possible to reduce $M(r)$ below the obvious $O(r^2)$ bound. For
example, binary splitting and the use of Karatsuba's idea [13] gives $M(r) = O(r^{\log_2 3})$.

The Toom-Cook algorithm [13] does not depend on the FFT, and it shows that

$$M(r) = O\left(r^{1 + (c/\ln r)^{1/2}}\right) \qquad (7.2)$$

as $r \to \infty$, for some positive constant $c$. However, the Toom-Cook algorithm is impractical, so
let us just assume that we use a polynomial multiplication algorithm which has

$$M(r) = O\left(r^{1+\epsilon}\right) \qquad (7.3)$$

for some fixed $\epsilon$ in (0, 1). Thus, using a recursive algorithm, we can evaluate the coefficients
$a_0, \ldots, a_{r-1}$ of (7.1) in $O(M(r))$ multiplications, and it is then easy to obtain the coefficients
$b_j = (j + 1) a_{j+1}$ in the formal derivative $P'(x) = \sum b_j x^j$.



Using fast polynomial evaluation techniques [4], we can now evaluate $P'(x)$ at $r$ points in
time $O(M(r))$. However,

$$D^2 = \pm \prod_{j=1}^{r} P'(x_j), \qquad (7.4)$$

so we can evaluate $D^2$ and then GCD$(N, D^2)$.
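
The identity (7.4) follows from $P'(x_i) = \prod_{j \ne i} (x_i - x_j)$, and can be checked with a small sketch. Note that the code below uses the obvious $O(r^2)$ polynomial arithmetic throughout, not the fast multiplication and evaluation algorithms discussed in this section; it is meant only to illustrate the identity, and all function names are our own.

```python
import math


def poly_from_roots(roots, N):
    """Coefficients a_0 .. a_r (low order first) of prod (x - x_j) mod N."""
    coeffs = [1]
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k + 1] = (nxt[k + 1] + c) % N      # contribution of x * c*x^k
            nxt[k] = (nxt[k] - x * c) % N          # contribution of -x_j * c*x^k
        coeffs = nxt
    return coeffs


def derivative(coeffs, N):
    """Formal derivative: b_j = (j + 1) * a_{j+1}."""
    return [(j * c) % N for j, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x, N):
    """Horner's rule mod N."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % N
    return acc


def check_identity(roots, N):
    """Return (product of P'(x_j), sign * D^2) mod N; the two should agree."""
    dP = derivative(poly_from_roots(roots, N), N)
    product = 1
    for x in roots:
        product = product * evaluate(dP, x, N) % N
    D = 1
    r = len(roots)
    for i in range(r - 1):
        for j in range(i + 1, r):
            D = D * (roots[i] - roots[j]) % N
    sign = -1 if (r * (r - 1) // 2) % 2 else 1
    return product, (sign * D * D) % N


lhs, rhs = check_identity([3, 10, 5, 20], 77)
common = math.gcd(lhs, 77)       # 3 and 10 agree mod 7, so 7 divides D
```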

Thus, we can perform the "birthday paradox" algorithm in time $O(m) + O(r^{1+\epsilon})$ per trial,
instead of $O(m) + O(r^2)$ if (6.5) is evaluated in the obvious way. To estimate the effect of this
improvement, choose $\alpha$ as in §5 and $\beta = 2/(1 + \epsilon)$ so that each phase of the "birthday paradox"
algorithm takes about the same time. From (3.4) we have

$$\frac{\rho(\alpha)}{\mu(\alpha, \beta)} = O\left(\frac{\ln\alpha}{(\alpha\ln\alpha)^{2/(1+\epsilon)}}\right) = O\left(\frac{\ln\ln p}{(\ln p \, \ln\ln p)^{1/(1+\epsilon)}}\right). \qquad (7.5)$$

Thus, for any $\epsilon' > \epsilon$, we have a speedup of at least order $(\ln p)^{1/(1+\epsilon')}$ over Lenstra's algorithm.
If we use (7.2) instead of (7.3) we obtain a speedup of order $\ln p$ in the same way.

Unfortunately the constants involved in the "O" estimates make the use of "fast" polynomial
multiplication and evaluation techniques of little value unless $r$ is quite large. If $r$ is a power
of 2 and binary splitting is used, so $\epsilon = \log_2 3 - 1 = 0.585$ above, we estimate that $D^2$ can be
evaluated in $8r^{1+\epsilon} + O(r)$ time units, compared to $r^2/2 + O(r)$ for the obvious algorithm. Thus,
the "fast" technique may actually be faster if $r \ge 2^{10}$. From the results of §8, this occurs if
$p > 10^{22}$ (approximately).



8 Optimal choice of parameters 

In Table 1 we give the results of a numerical calculation of the expected work $W$ required to
find a prime factor $p$ of a large integer $N$, using four different algorithms:

1. The Brent-Pollard "rho" algorithm [5], which may be considered as a benchmark.

2. Lenstra's one-phase elliptic curve algorithm, as described in §5.

3. Our "birthday paradox" two-phase algorithm, as described in §6, with $\epsilon = 1$.

4. The "birthday paradox" algorithm with $\epsilon = 0.585$, as described in §7, with $r$ restricted to
be a power of 2.



    log10 p    Alg. 1    Alg. 2    Alg. 3    Alg. 4

       6        3.49      4.67      4.09      4.26
       8        4.49      5.38      4.76      4.91
      10        5.49      6.03      5.39      5.53
      12        6.49      6.62      5.97      6.07
      14        7.49      7.18      6.53      6.60
      16        8.49      7.71      7.05      7.12
      18        9.49      8.21      7.56      7.59
      20       10.49      8.69      8.04      8.05
      30       15.49     10.85     10.22     10.14
      40       20.49     12.74     12.11     11.97
      50       25.49     14.44     13.82     13.62

Table 1: $\log_{10} W$ versus $\log_{10} p$ for Algorithms 1-4



In all cases $W$ is measured in terms of multiplications (mod $N$), with one extended GCD
computation counting as 8 multiplications (see §2). The parameters $\alpha$ and $\beta$ were chosen to
minimize the expected value of $W$ for each algorithm (using numerical minimization if
necessary). The results are illustrated in Figure 1.

From Table 1 we see that Algorithm 3 is better than Algorithm 1 for $p \gtrsim 10^{10}$, while
Algorithm 2 is better than Algorithm 1 for $p \gtrsim 10^{13}$. Algorithm 3 is 4 to 4.5 times faster than
Algorithm 2. Algorithm 4 is slightly faster than Algorithm 3 if $p \gtrsim 10^{22}$.

The differences between the algorithms appear more marked if we consider how large a factor
$p$ we can expect to find in a given time. Suppose that we can devote $10^{10}$ units of work to the
factorization. Then, by interpolation in Table 1 (or from Figure 1), we see that the upper bounds
on $p$ for Algorithms 1, 2 and 3 are about $10^{19}$, $10^{26}$ and $10^{29}$ respectively.



    log10 p      α       β          m       r        T     w21    m/T      S

      10       3.72    1.56       484     104     12.1    0.64     40    4.37
      20       4.65    1.35     19970     669    147.5    0.47    135    4.46
      30       5.36    1.27    397600    2939     1141    0.44    348    4.32

Table 2: Optimal parameters for Algorithm 3

In Table 2 we give the optimal parameters $\alpha$, $\beta$, $m = p^{1/\alpha}$, $r = (m^{\beta} \ln 2)^{1/2}$,
$T$ = expected number of trials (from (6.4)), $m/T$, $w_{21}$ = (work for phase 2)/(work for phase 1),
and $S$ = speedup over Lenstra's algorithm, all for Algorithm 3 and several values of $p$.

In order to check that the algorithms perform as predicted, we factored several large $N$ with
smallest prime factor $p \approx 10^{12}$. In Table 3 we give the observed and expected work to find
each factor by Algorithms 2 and 3. The agreement is reasonably good, considering the number
of approximations made in the analysis. If anything the algorithms appear to perform slightly
better than expected.

    Algorithm    Number of         Observed work        Expected work
                 factorizations    per factor / 10^6    per factor / 10^6

        2            126           3.41 ± 0.30          4.17
        3            100           0.74 ± 0.06          0.94

Table 3: Observed versus expected work per factor for Algorithms 2-3

In practice we do not know $p$ in advance, so it is difficult to choose the optimal parameters
$\alpha$, $\beta$ etc. There are several approaches to this problem. If we are willing to devote a certain
amount of time to the attempt to factorize $N$, and intend to give up if we are unsuccessful after
the given amount of time, then we may estimate how large a factor $p$ we are likely to find (using
Table 1 or Figure 1) and then choose the optimal parameters for this "worst case" $p$. Another
approach is to start with a small value of $m$ and increase $m$ as the number of trials $T$ increases.
From Table 2, it is reasonable to take $m/T \approx 135$ if we expect to find a prime factor $p \approx 10^{20}$.
Once $m$ has been chosen, we may choose $r$ (for Algorithms 3 or 4) so that $w_{21}$ (the ratio of the
work for phase 2 to the work for phase 1) has a moderate value. From Table 2, $w_{21} \approx 0.5$ is
reasonable. In practice these "ad hoc" strategies work well because the total work required by
the algorithms is not very sensitive to the choice of their parameters (e.g. if $m$ is chosen too
small then $T$ will be larger than expected, but the product $mT$ is relatively insensitive to the
choice of $m$).
9 Further refinements 

In this section we mention some further refinements which can be used to speed up the algorithms 
described in §§5-7. Details will appear elsewhere. 

9.1 Better choice of random points 

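Let $e > 1$ be a fixed exponent, let $b_i$ and $\bar{b}_j$ be random linear functions of $i$ and $j$
respectively, $a_i = b_i^e$, $\bar{a}_j = \bar{b}_j^e$, and $rs \approx m^{\beta}$. In the birthday paradox algorithm we may compute

$$(x_i, y_i) = Q^{a_i} \quad (i = 1, \ldots, r)$$

and

$$(\bar{x}_j, \bar{y}_j) = Q^{\bar{a}_j} \quad (j = 1, \ldots, s)$$

and replace (6.1) by

$$d = \prod_{j=1}^{s} \prod_{i=1}^{r} (x_i - \bar{x}_j) \pmod N. \qquad (9.1)$$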

[Figure 1: $\log_{10} W$ versus $\log_{10} p$ for Algorithms 1-3]


Using $e > 1$ is beneficial because the number of solutions of $x^e = 1 \pmod{n_1}$ is GCD$(e, n_1 - 1)$.
We take $b_i$ and $\bar{b}_j$ to be linear functions of $i$ and $j$ so that the $e$-th differences of the $a_i$ and $\bar{a}_j$
are constant, which allows the computation of $x_1, \ldots, x_r$ and $\bar{x}_1, \ldots, \bar{x}_s$ in $O((r + s)e)$ group
operations. The values of $\bar{x}_j$ do not need to be stored, so storage requirements are $O(r)$ even
if $s \gg r$. Moreover, by use of rational preconditioning [22, 29] it is easy to evaluate (9.1) in
$(r + O(\log r))s/2$ multiplications. Using these ideas we obtain a speedup of about 6.6 over the
one-phase algorithm for $p \approx 10^{20}$.

9.2 Other second phases 

Our birthday paradox idea can be used as a second phase for Pollard's "$p - 1$" algorithm [23].
The only change is that we work over a different group. Conversely, the conventional second
phases for Pollard's "$p - 1$" algorithm can be adapted to give second phases for elliptic curve
algorithms, and various tricks can be used to speed them up [19]. Theoretically these algorithms
give a speedup of the order $\log\log(p)$ over the one-phase algorithms, which is not as good as
the $\log(p)$ speedup for the birthday paradox algorithm [6]. However, in practice, the speedups
are comparable (in the range 6 to 8). We prefer the birthday paradox algorithm because it does
not require a large table (or on-line generation) of primes for the second phase, so it is easier to
program and has lower storage requirements.

9.3 Better choice of random elliptic curves

Montgomery [21] and Suyama [28] have shown that it is possible to choose "random" elliptic
curves so that $g$ is divisible by certain powers of 2 and/or 3. For example, we have implemented
a suggestion of Suyama which ensures that $g$ is divisible by 12. This effectively reduces $p$ to
$p/12$ in the analysis above, so gives a speedup which is very significant in practice, although not
significant asymptotically.

9.4 Faster group operations 

Montgomery [21] and Chudnovsky and Chudnovsky [8] have shown that the Weierstrass normal
form (5.1) may not be optimal if we are interested in minimizing the number of arithmetic
operations required to perform group operations. If (5.1) is replaced by

$$by^2 = x^3 + ax^2 + x \pmod N \qquad (9.2)$$

then we can dispense with the $y$ coordinate and compute $P^n$ in $10\log_2 n + O(1)$ multiplications
(mod $N$), instead of about $\frac{3}{2}\left(\frac{11}{3} + K\right)\log_2 n$ multiplications (mod $N$), a saving of about 43% if
$K = 8$.
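
The quoted saving is a one-line calculation, spelled out here as a small sketch; the function names are our own and the per-bit costs are simply those stated in the text.

```python
def weierstrass_cost_per_bit(K):
    """Average multiplications (mod N) per bit of n for form (5.1): (3/2)(11/3 + K)."""
    return 1.5 * (11.0 / 3.0 + K)


def saving_with_form_9_2(K):
    """Fractional saving when 10 multiplications per bit are used instead."""
    return 1.0 - 10.0 / weierstrass_cost_per_bit(K)


cost = weierstrass_cost_per_bit(8)      # about 17.5 multiplications per bit
saving = saving_with_form_9_2(8)        # about 0.43
```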

The effect of the improvements described in §§9.3-9.4 is to speed up both the one-phase and 
two-phase algorithms by a factor of 3 to 4. 

10 Conclusion 

Lenstra's algorithm is currently the fastest known factorization algorithm for large $N$ having
a factor $p \gtrsim 10^{13}$, $\ln p/\ln N \ll 1/2$. It is also ideally suited to parallel computation, since the
factorization process involves a number of independent trials which can be performed in parallel.

We have described how to improve on Lenstra's algorithm by the addition of a second phase. 
The theoretical speedup is of order ln(p). From an asymptotic point of view this is not very 
impressive, but in practice it is certainly worth having and may increase the size of factors 
which can be found in a reasonable time by several orders of magnitude (see Figure 1 and the 
comments in §8). 

Given increasing circuit speeds and increasing use of parallelism, it is reasonable to predict 
that 10^^ multiplications might be devoted to factorizing a number in the not-too-far-distant 
future (there are about $3 \times 10^{13}$ microseconds in a year). Thus, from Table 1, it will be feasible
to find prime factors p with up to about 50 decimal digits by the algorithms based on elliptic 
curves. Other algorithms [27] may be even more effective on numbers which are the product of 
two roughly equal primes. This implies that the composite numbers N on which the RSA public- 
key cryptosystem [25, 26] is based should have at least 100 decimal digits if the cryptosystem is 
to be reasonably secure. 

Postscript and historical note (added 7 November 1998)

The LaTeX source file was retyped in 1998 from the version (rpb102) which appeared in
Proceedings of the Ninth Australian Computer Science Conference, special issue of Australian
Computer Science Communications 8 (1986), 149-163 [submitted 24 September 1985, and in final
form 10 December 1985]. No attempt has been made to update the contents, but minor typographical
errors have been corrected (for example, in equations (1.1), (6.3), (6.7) and (9.2)). Some minor
changes have been made for cosmetic reasons, e.g. $d'$ was changed to $D$ in (6.5), and some
equations have been displayed more clearly using the LaTeX \frac{...}{...} and \sqrt{...}
constructs - see for example (5.5)-(5.7).

A preliminary version (rpb097tr) appeared as Report CMA-R32-85, Centre for Mathematical
Analysis, Australian National University, September 1985. It is more detailed but does not
include the section on "further refinements" (§9 above).

For developments up to mid-1997, see:

30. R. P. Brent, Factorization of the tenth Fermat number, Mathematics of Computation
68 (January 1999), to appear (rpb161). A preliminary version (Factorization of the tenth
and eleventh Fermat numbers, Technical Report TR-CS-96-02, Computer Sciences
Laboratory, ANU, February 1996) is also available in electronic form (rpb161tr).

Further remarks (added 3 December 1998) 

In the estimate (1.1), $T_1(p)$ is the arithmetic complexity. The bit complexity is a factor $M(N)$
larger, where $M(N)$ is the number of bit operations required to multiply integers mod $N$. As
explained in §2, we take one multiplication mod $N$ as the basic unit of work. In applications
such as the factorization of large Fermat numbers, the factor $M(N)$ is significant.

In §4 the group operation on the elliptic curve is written as multiplication, because of the
analogy with the Pollard "$p - 1$" method. Nowadays the group operation is nearly always written
as addition, see for example [30].

At the end of §8, we say that "the product $mT$ is relatively insensitive to the choice of $m$".
See [30, Table 3] for an indication of how the choice of non-optimal $m$ changes the efficiency of
the method.